AI029
Reinforcement Learning: An Introduction
Temporal-Difference Learning
Learning Objectives
- Define the TD(0) update rule and its relation to the Bellman equation.
- Contrast TD learning with Monte Carlo methods in terms of bias, variance, and the ability to update online.
- Explain the concept of bootstrapping and its role in TD prediction.
- Describe the Sarsa (on-policy) and Q-learning (off-policy) algorithms for control.
- Analyze the advantages of TD learning in environments without a transition model.
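
As a point of reference for the first objective, the TD(0) update can be sketched as below. This is a minimal illustration and is not taken from the course material: the five-state random walk environment, the function name `td0_random_walk`, and the values chosen for `alpha`, `gamma`, `episodes`, and `seed` are all assumptions made for the example.

```python
import random


def td0_random_walk(episodes=5000, alpha=0.05, gamma=1.0, n_states=5, seed=0):
    """Estimate state values under a uniform random policy with tabular TD(0).

    Hypothetical environment (an assumption for this sketch): states
    0..n_states-1 lie in a row. Stepping off the left end terminates with
    reward 0; stepping off the right end terminates with reward 1.
    """
    rng = random.Random(seed)
    V = [0.0] * n_states  # value estimates, initialised to zero

    for _ in range(episodes):
        s = n_states // 2  # every episode starts in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))  # random policy: left or right
            if s_next < 0:
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, V[s_next], False

            # TD(0) update: V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)].
            # The target bootstraps from the current estimate V(s') instead of
            # waiting for the full return, so no transition model is needed.
            V[s] += alpha * (reward + gamma * v_next - V[s])

            if done:
                break
            s = s_next
    return V


if __name__ == "__main__":
    print([round(v, 2) for v in td0_random_walk()])
```

With `gamma=1.0`, the true values for this walk are 1/6, 2/6, ..., 5/6 from left to right. The estimates should settle near those values but keep fluctuating, because the step size `alpha` is held constant in this sketch.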